Practical 2A - microbiota analyses
2024 NUTRIM microbiome & metabolome workshop
1 Intro
WORK IN PROGRESS - More restructuring and refinement is planned.
2 Research questions 💪
-
Primary aim:
Does the bacterial gut microbiota composition of IBD-diagnosed patients differ from the control patients?- Diversity: Is richness or diversity associated with IBD diagnosis?
- Composition: Does overall bacterial microbiota composition associated with IBD diagnosis?
- Taxa: Is the relative abundance of specific bacterial taxa (e.g. genera) associated with IBD diagnosis?
-
Secondary aims (see part 2B):
- Activity: Is current disease activity level associated with microbiota diversity, composition, or the relative abundance of specific taxa?
- Medication: Are IBD-related medications associated with microbiota diversity, composition, or the relative abundance of specific taxa?
3 Outline of Part 2A
-
Diversity
- Compare richness and diversity between groups
-
Dissimilarity
- Create (interactive) ordination plots and bar charts
- Compare overall compositions between groups
-
Differential abundance
- Compare relative abundance of individual taxa between groups
4 Load R packages 📦
5 Read phyloseq data
Read the phyloseq we created in part 1.
6 Diversity
Our research questions for this section are:
- How rich and diverse is the gut bacterial microbiota of each patient?
- Does this gut microbiota richness or diversity differ by diagnosis group?
6.1 Richness
- The simplest richness measure is just counting, a.k.a. “Observed Richness”.
- We already computed the observed richness of genera in part 1B.
ps %>%
samdat_tbl() %>%
ggplot(aes(x = N_genera, y = diagnosis, color = diagnosis)) +
geom_boxplot(outliers = FALSE) +
geom_jitter(height = 0.15, alpha = 0.5) +
theme_classic()You can use standard statistical testing on the richness values e.g. linear regression or ANOVA
richness_lm <- lm(data = samdat_tbl(ps), formula = N_genera ~ diagnosis)
anova(richness_lm)Analysis of Variance Table
Response: N_genera
Df Sum Sq Mean Sq F value Pr(>F)
diagnosis 2 1837.3 918.63 7.8329 0.0007448 ***
Residuals 87 10203.2 117.28
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You could do also standard ANOVA post-hoc pairwise comparisons.
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = richness_lm)
$diagnosis
diff lwr upr p adj
CD-Other -5.275362 -12.81034 2.259620 0.2227571
UC-Other -10.782946 -17.36256 -4.203332 0.0005347
UC-CD -5.507583 -12.17836 1.163197 0.1261212
Combine group comparison stats and plots in one go with ggstatsplot::ggbetweenstats()
ggbetweenstats(
data = samdat_tbl(ps), x = diagnosis, y = N_genera,
type = "parametric", p.adjust.method = "fdr", var.equal = TRUE,
bf.message = FALSE, results.subtitle = TRUE
)6.2 Diversity
- Remember: a true measure of ecosystem diversity (e.g. Shannon index) will consider the richness and evenness of the ecosystem.
- A rich ecosystem dominated by only one or two of its taxa is still a less diverse ecosystem than one with an even distribution of the same number of taxa.
- We already computed Shannon diversity of genera and the effective Shannon in part 1B.
- Here we will use the effective Shannon diversity \(e^H\) - because of its more intuitive interpretation.
ps %>%
samdat_tbl() %>%
ggplot(aes(x = Effective_Shannon_Genus, y = diagnosis, color = diagnosis)) +
geom_boxplot(outliers = FALSE) +
geom_jitter(height = 0.2) +
theme_classic()eShannon_lm <- lm(data = samdat_tbl(ps), formula = Effective_Shannon_Genus ~ diagnosis)
anova(eShannon_lm)Analysis of Variance Table
Response: Effective_Shannon_Genus
Df Sum Sq Mean Sq F value Pr(>F)
diagnosis 2 158.96 79.482 7.5185 0.0009731 ***
Residuals 87 919.73 10.572
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = eShannon_lm)
$diagnosis
diff lwr upr p adj
CD-Other -1.541564 -3.803828 0.7207006 0.2406931
UC-Other -3.170198 -5.145627 -1.1947686 0.0007080
UC-CD -1.628634 -3.631435 0.3741666 0.1339407
You can get a tidy data frame of results using the broom::tidy function on various statistical model objects.
broom::tidy(eShannon_tukey)# A tibble: 3 × 7
term contrast null.value estimate conf.low conf.high adj.p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 diagnosis CD-Other 0 -1.54 -3.80 0.721 0.241
2 diagnosis UC-Other 0 -3.17 -5.15 -1.19 0.000708
3 diagnosis UC-CD 0 -1.63 -3.63 0.374 0.134
This can be useful when you need to save your model output e.g. to Excel, to format for a table in your article.
broom::tidy(eShannon_tukey) %>% write_xlsx(here("practical-2/test-table.xlsx"))library(ggstatsplot)
ggbetweenstats(
data = samdat_tbl(ps), x = diagnosis, y = Effective_Shannon_Genus,
type = "parametric", p.adjust.method = "fdr", var.equal = TRUE,
bf.message = FALSE, results.subtitle = TRUE
)7 Dissimilarity & Ordination
Our research questions for this section are:
- What gut bacterial microbiota compositions are present or common in this cohort?
- Does the average overall composition of the gut microbiota differ by patient diagnosis group?
7.1 Aggregate ➔ Calculate ➔ Ordinate
In order to create a PCoA ordination - we need to first make two choices:
- At which taxonomic rank will we aggregate the counts? (for 16S data, this is usually Genus)
- Which dissimilarity measure to use when calculating the distance matrix?
7.2 Common dissimilarity measures
You heard about several commonly-used dissimilarity measures in the lecture. In the sections below, we will calculate a distance matrix with each one, and use each to plot a PCoA and test for diagnosis group differences with PERMANOVA.
An unweighted measure - based only on taxon presence or absence.
You must remember to run a “binary” transform on your data before computing “jaccard” distance.
psx_jaccardpsExtra object - a phyloseq object with extra slots:
phyloseq-class experiment-level object
otu_table() OTU Table: [ 178 taxa and 90 samples ]
sample_data() Sample Data: [ 90 samples by 20 sample variables ]
tax_table() Taxonomy Table: [ 178 taxa by 6 taxonomic ranks ]
otu_get(counts = TRUE) [ 178 taxa and 90 samples ]
psExtra info:
tax_agg = "Genus" tax_trans = "binary"
jaccard distance matrix of size 90
0.7173913 0.5588235 0.7241379 0.5483871 0.75 ...
psx_jaccard %>%
ord_calc("PCoA") %>%
ord_plot(colour = "diagnosis") +
stat_ellipse(aes(colour = diagnosis)) +
coord_equal()perm_jaccard <- psx_jaccard %>% dist_permanova(variables = "diagnosis")2024-06-29 10:32:00.172931 - Starting PERMANOVA with 999 perms with 1 processes
2024-06-29 10:32:00.249739 - Finished PERMANOVA
perm_get(perm_jaccard)Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999
vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
Df SumOfSqs R2 F Pr(>F)
diagnosis 2 1.970 0.08987 4.2954 0.001 ***
Residual 87 19.951 0.91013
Total 89 21.921 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Bray-Curtis is an abundance-weighted dissimilarity measure.
It is probably the most commonly used dissimilarity measure in microbiome research.
psx_braypsExtra object - a phyloseq object with extra slots:
phyloseq-class experiment-level object
otu_table() OTU Table: [ 178 taxa and 90 samples ]
sample_data() Sample Data: [ 90 samples by 20 sample variables ]
tax_table() Taxonomy Table: [ 178 taxa by 6 taxonomic ranks ]
psExtra info:
tax_agg = "Genus" tax_trans = "identity"
bray distance matrix of size 90
0.8294664 0.7156324 0.5111652 0.5492537 0.8991251 ...
psx_bray %>%
ord_calc("PCoA") %>%
ord_plot(colour = "diagnosis") +
stat_ellipse(aes(colour = diagnosis)) +
coord_equal()perm_bray <- psx_bray %>% dist_permanova(variables = "diagnosis")2024-06-29 10:35:03.967081 - Starting PERMANOVA with 999 perms with 1 processes
2024-06-29 10:35:04.046079 - Finished PERMANOVA
perm_get(perm_bray)Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999
vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
Df SumOfSqs R2 F Pr(>F)
diagnosis 2 1.5068 0.05441 2.5032 0.001 ***
Residual 87 26.1846 0.94559
Total 89 27.6914 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The phylogenetic distance family - has unweighted, weighted, and generalised versions.
You must use ASV-level data (i.e. no taxonomic aggregation) and have a phylogenetic tree available.
We will not practice with UniFrac distances today, because they can be quite slow to calculate.
The Aitchison distance is a CoDA distance method - named after John Aitchison, a pioneer in the field of Compositional Data analysis. related to CLR + PCA.
psx_aitchisonpsExtra object - a phyloseq object with extra slots:
phyloseq-class experiment-level object
otu_table() OTU Table: [ 178 taxa and 90 samples ]
sample_data() Sample Data: [ 90 samples by 20 sample variables ]
tax_table() Taxonomy Table: [ 178 taxa by 6 taxonomic ranks ]
psExtra info:
tax_agg = "Genus" tax_trans = "identity"
aitchison distance matrix of size 90
28.2123 18.57284 19.08465 21.43771 23.69247 ...
psx_aitchison %>%
ord_calc("PCoA") %>%
ord_plot(colour = "diagnosis") +
stat_ellipse(aes(colour = diagnosis)) +
coord_equal()perm_aitchison <- psx_aitchison %>% dist_permanova(variables = "diagnosis")2024-06-29 10:35:04.924642 - Starting PERMANOVA with 999 perms with 1 processes
2024-06-29 10:35:04.999335 - Finished PERMANOVA
perm_get(perm_aitchison)Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999
vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
Df SumOfSqs R2 F Pr(>F)
diagnosis 2 2552 0.10114 4.8946 0.001 ***
Residual 87 22681 0.89886
Total 89 25233 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- These terms are often used interchangeably
- Strictly: All distances are dissimilarities, but not all dissimilarities are distances.
- A true “distance metric” \(d\), must satisfy 3 properties:
- Identity of indiscernibles: For any samples \(a\) and \(b\), \(d(a, b) = 0\) if and only if \(a = b\)
- Symmetry: For any samples \(a\) and \(b\), \(d(a, b) = d(b, a)\)
- Triangle inequality: For any samples \(a\), \(b\), and \(c\), \(d(a, c) ≤ d(a, b) + d(b, c)\)
- (3) can be interpreted as: “the direct path between two points must be at least as short as any detour”
- (3) is not true for e.g. Bray-Curtis, but in practice it is very rarely problematic.
- The object is now class
psExtra(created by microViz) - A psExtra also stores info about the aggregation and transformations you performed
You can extract the distance matrix with dist_get.
as.matrix(distances)[1:5, 1:5] 099_AX 199_AX 062_BZ 194_AZ 166_AX
099_AX 0.0000000 0.7173913 0.5588235 0.7241379 0.5483871
199_AX 0.7173913 0.0000000 0.7692308 0.8695652 0.6956522
062_BZ 0.5588235 0.7692308 0.0000000 0.6875000 0.5714286
194_AZ 0.7241379 0.8695652 0.6875000 0.0000000 0.5925926
166_AX 0.5483871 0.6956522 0.5714286 0.5925926 0.0000000
The Binary Jaccard dissimilarities range between 0 (identical) and 1 (no shared genera).
7.3 PCoA
Principal Co-ordinates Analysis is one kind of ordination.
- PCoA takes a distance matrix and finds new dimensions (a co-ordinate system, if you like)
- The new dimensions are created with the aim to preserve the original distances between samples
- And to capture the majority of this distance information in the first dimensions
- This makes it easier to visualize the patterns in your data, in 2D or 3D plots 👀
- There is helpful info about ordination methods including PCoA on the GUide to STatistical Analysis in Microbial Ecology website (GUSTA ME). https://sites.google.com/site/mb3gustame/dissimilarity-based-methods/principal-coordinates-analysis
- This website covers a lot of other topics too, which may be interesting for you to read at a later date if you’ll work on microbiome analysis.
7.4 Interactive ordination!
microViz provides a Shiny app ord_explore to interactively create and explore PCoA plots and other ordinations. See the code below to get started. But read the instructions first.
- Colour the samples using the variables in the sample data
- Select a few samples to view their composition on bar charts!
- Change some ordination options:
- Different rank of taxonomic aggregation
- Different distances we’ve discussed
- Copy the automatically generated code
- Exit the app (press escape or click red 🛑 button in R console!)
- Paste and run the code to recreate the ordination plot
- Customise the plot: change colour scheme, title, etc.
- Launch the app again with a different subset of the data
- Practice using
ps_filteretc. - e.g. plot the data of only the UC patients’ gut microbiota!
- compute and colour points by an alpha diversity measure?
- Practice using
- Unblock pop-ups: To allow the interactive analysis window to open in your browser, you may need to unblock pop-ups for your AMI instance address (check for messages about this after running the ord_explore command)
-
Slow UniFrac: UniFrac distances can be quite slow (over a minute) to calculate!
- Filter to fewer samples and fewer taxa to speed it up (Before launching the app)
-
Many other distances: There are many distances available, feel free to try out ones we haven’t talked about
-
BUT:
- You shouldn’t use a distance that you don’t understand in your actual work, even if the plot looks nice! 😉
- A few of the distances might not work…
- They are mostly implemented in the package
veganand I haven’t tested them all - Errors will appear in the RStudio R console
- You can report to me any distances that don’t work (if you’re feeling helpful! 😇)
- They are mostly implemented in the package
-
BUT:
-
Other ordination methods: There are other ordination methods available in
ord_explore- Try out PCA, principal components analysis, which does NOT use distances
- We will not cover constrained and conditioned ordinations
- If you are interested in e.g. RDA, you can look this up later
- See the Guide to Statistical Analysis in Microbial Ecology
# fire up the shiny app
# run these lines in your console (don't keep in script/notebook)
ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
# calculate new sample variables with dominant taxon (optional)
ps_calc_dominant(rank = "Genus", none = "Mixed", other = "Other") %>%
# launch a Shiny app in your web browser!
ord_explore()# Again, with different options
# Run these lines in your console
ps %>%
ps_filter(diagnosis == "UC") %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
# calculate a few sample variables for interest (optional)
ps_calc_dominant(rank = "Genus", none = "Mixed", other = "Other") %>%
ps_calc_diversity(rank = "Genus", index = "shannon") %>%
ps_calc_richness(rank = "Genus", index = "observed") %>%
ord_explore()7.5 PERMANOVA
Permutational multivariate analysis of variance.
- ANOVA - analysis of variance (statistical modelling approach)
- Multivariate - more than one dependent variable (multiple taxa!)
- Permutational - statistical significance estimates obtained by shuffling the data many times
See this excellent book chapter by Marti Anderson on PERMANOVA: https://onlinelibrary.wiley.com/doi/full/10.1002/9781118445112.stat07841
Sometimes PERMANOVA is called NP-MANOVA (non-parametric MANOVA)
e.g. on the GUide to STatistical Analysis in Microbial Ecology website.
TLDR: Are those groups on the PCoA actually different??
ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
tax_agg(rank = "Genus") %>%
dist_calc(dist = "bray") %>%
ord_calc(method = "PCoA") %>%
ord_plot(alpha = 0.6, size = 2, color = "diagnosis") +
theme_classic(12) +
coord_fixed(0.7) +
stat_ellipse(aes(color = diagnosis)) +
scale_color_brewer(palette = "Set1")ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
tax_agg(rank = "Genus") %>%
dist_calc(dist = "bray") %>%
dist_permanova(variables = "diagnosis", n_perms = 99, seed = 123) %>%
perm_get()2024-06-29 10:35:07.015013 - Starting PERMANOVA with 99 perms with 1 processes
2024-06-29 10:35:07.026533 - Finished PERMANOVA
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 99
vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
Df SumOfSqs R2 F Pr(>F)
diagnosis 2 1.5945 0.05962 2.7579 0.01 **
Residual 87 25.1499 0.94038
Total 89 26.7444 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Use more permutations for a more reliable p.value in your real work (slower)
# Set a random seed number for reproducibility of this stochastic methodYou can see from the model output that the p value, Pr(>F) is below 0.05. So there is good statistical evidence that the bacterial gut microbiota composition of C-section delivered infants has a different composition than vaginally delivered infants at 4 days of age.
Reporting PCoA and PERMANOVA methods
- Your methodological choices matter, you should report what you did:
- any relevant rare taxon filtering thresholds
- the taxonomic rank of aggregation
- the dissimilarity measure used to compute the pairwise distances
It’s probably a good idea to decide on a couple of appropriate distance measures up front for these tests, and report both (at least in supplementary material), as the choice of distance measure can affect results and conclusions!
Covariate-adjusted PERMANOVA
You can also adjust for covariates in PERMANOVA, and often should, depending on your study design. Let’s fit a more complex model, adjusting for sex and age.
ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
tax_agg(rank = "Genus") %>%
dist_calc(dist = "bray") %>%
dist_permanova(
variables = c("diagnosis", "gender", "age_years"),
n_perms = 99, seed = 111
) %>%
perm_get()2024-06-29 10:36:56.670887 - Starting PERMANOVA with 99 perms with 1 processes
2024-06-29 10:36:56.723097 - Finished PERMANOVA
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 99
vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
Df SumOfSqs R2 F Pr(>F)
diagnosis 2 1.5442 0.05774 2.6547 0.01 **
gender 1 0.1482 0.00554 0.5095 0.97
age_years 1 0.2689 0.01005 0.9245 0.59
Residual 85 24.7210 0.92434
Total 89 26.7444 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Use more permutations for a more reliable p.value in your real work (it is slower).
Always set a seed number for reproducibility of this random permutation method!
7.6 PCA
- Principal Components Analysis.
- For practical purposes, PCA is quite similar to Principal Co-ordinates Analysis.
- In fact, PCA produces equivalent results to PCoA with Euclidean distances.
- Euclidean distances are essentially a generalization of Pythagoras’ theorem to more dimensions.
- In our data every taxon is a feature, a dimension, on which we calculate Euclidean distances.
Pythagoras’ theorem:
\[c = \sqrt{a^2 + b^2}\]
Euclidean distance:
\[d\left(p, q\right) = \sqrt{\sum _{i=1}^{n_{taxa}} \left( p_{i}-q_{i}\right)^2 }\]
- Distance \(d\) between samples \(p\) and \(q\), with \(n\) taxa.
Problems with PCA (or PCoA with Euclidean Distances) on microbiota data
- These plots look weird! most samples bunch in the middle, with spindly projections..
- Sensitive to sparsity (double-zero problem) –> filter rare taxa
- Excessive emphasis on high-abundance taxa –> log transform features first
Log transformations, and CLR
- First let’s look at the abundance again, this time with heatmaps.
- Each column is a sample (from a child), and each row is a taxon.
ps %>%
tax_sort(by = sum, at = "Genus", trans = "compositional", tree_warn = FALSE) %>%
tax_transform(trans = "compositional", rank = "Genus") %>%
comp_heatmap(samples = 1:20, taxa = 1:20, name = "Proportions", tax_seriation = "Identity")Registered S3 method overwritten by 'seriation':
method from
reorder.hclust vegan
- Even though we have picked the top 20 most abundant genera, there are still a lot of zeros
-
Problem: We need to deal with the zeros, because
log(0)is undefined. - Solution: add a small amount to every value (or just every zero), before applying the log transformation.
- This small value is often called a pseudo-count.
- One easy option is to just add a count of 1
- Another popular option is to add half of the smallest observed real value (from across the whole dataset)
- In general, for zero replacement, keep it simple and record your approach
Centered Log Ratio transformation:
Remember, Microbiome Datasets Are Compositional: And This Is Not Optional.
- The sequencing data gives us relative abundances, not absolute abundances.
- The total number of reads sequenced per sample is an arbitrary total.
This leads to two main types of problem:
- Interpretation caveats: see differential abundance section later
- Statistical issues: taxon abundances are not independent, and may appear negatively correlated
- These issues are worse with simpler ecosystems
Example: If one taxon blooms, the relative abundance of all other taxa will appear to decrease, even if they did not.
There is the same problem in theory with RNAseq data, but I suspect it is less bothersome because there are many more competing “species” of RNA transcript than there are bacterial species in even a very complex microbiome. The centered-log-ratio transformation (along with some other similar ratio transformations) are claimed to help with the statistical issues by transforming the abundances from the simplex to the real space.
TL;DR - the CLR transformation is useful for compositional microbiome data.
- Practically, the CLR transformation involves finding the geometric mean of each sample
- Then dividing abundance of each taxon in that sample by this geometric mean
- Finally, you take the natural log of these ratios
ps %>%
tax_sort(by = sum, at = "Genus", trans = "compositional", tree_warn = FALSE) %>%
tax_agg(rank = "Genus") %>%
tax_transform(trans = "clr", zero_replace = "halfmin", chain = TRUE) %>%
comp_heatmap(
samples = 1:20, taxa = 1:20, grid_lwd = 2, name = "CLR",
colors = heat_palette(sym = TRUE),
tax_seriation = "Identity"
)PCA on CLR-transformed taxa
ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
tax_transform(rank = "Genus", trans = "clr", zero_replace = "halfmin") %>%
ord_calc(method = "PCA") %>%
ord_plot(alpha = 0.6, size = 2, color = "diagnosis") +
theme_classic(12) +
coord_fixed(0.7)- After the CLR transformation, the plot looks better
- We can see a pattern where the gut microbiomes of infants cluster by birth mode
So why is PCA interesting for us?
Principal components are built directly from a (linear) combination of the original features.
That means we know how much each taxon contributes to each PC dimension
We can plot this information (loadings) as arrows, alongside the sample points
pca <- ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
tax_transform(rank = "Genus", trans = "clr", zero_replace = "halfmin") %>%
ord_calc(method = "PCA") %>%
ord_plot(
alpha = 0.6, size = 2, color = "diagnosis",
plot_taxa = 1:6, tax_vec_length = 0.5,
tax_lab_style = tax_lab_style(
type = "text", max_angle = 90, aspect_ratio = 1,
size = 3, fontface = "bold"
),
) +
theme_classic(12) +
coord_fixed(ratio = 1, xlim = c(-3, 3), ylim = c(-3, 3), clip = "off")
pcaInterestingly, samples on the right of the plot (which tend to be UC patients) seem to have relatively more Escherichia/Shigella, and less Blautia, Faecalibacterium and Roseburia.
Cautiously
- There are caveats and nuance to the interpretation of these plots, which are called PCA bi-plots
- You can read more here: https://sites.google.com/site/mb3gustame/indirect-gradient-analysis/principal-components-analysis
In general:
The relative length and direction of an arrow indicates how much that taxon contributes to the variation on each visible PC axis, e.g. Variation in Faecalibacterium abundance contributes quite a lot to variation along the PC1 axis.
The direction allows you to infer that samples positioned towards the left of the plot will tend to have higher relative abundance of Faecalibacterium than samples on the right of the plot.
Bacteroides variation contributes to both PC1 and PC2, as indicate by its high (negative) values on both axes.
We can make another kind of bar plot, using the PCA information to order our samples in a circular layout.
iris <- ps %>%
tax_filter(min_prevalence = 2, verbose = FALSE) %>%
tax_transform(rank = "Genus", trans = "clr", zero_replace = "halfmin") %>%
ord_calc(method = "PCA") %>%
ord_plot_iris(
tax_level = "Genus", n_taxa = 12, other = "Other",
anno_colour = "diagnosis",
anno_colour_style = list(alpha = 0.6, size = 0.6, show.legend = FALSE)
)patchwork::wrap_plots(pca, iris, nrow = 1, guides = "collect")8 More bar charts
- Stacked compositional bar charts using the
microVizpackage,comp_barplot()function. - Let’s start with a smaller subset of the data, just the control group.
We can filter the samples like this, using the sample_data information
phyloseq-class experiment-level object
otu_table() OTU Table: [ 15833 taxa and 24 samples ]
sample_data() Sample Data: [ 24 samples by 20 sample variables ]
tax_table() Taxonomy Table: [ 15833 taxa by 6 taxonomic ranks ]
If you forget a variable’s levels, check with table() or unique()
ps %>%
ps_filter(case_control == "Control") %>%
comp_barplot("Family", n_taxa = 10, merge_other = FALSE) +
coord_flip() +
ggtitle("Control participants")8.1 Organising your bar charts
- Let’s look at all of the data from all diagnosis groups.
- We can add the ggplot2
facet_wrap()to our plot, to separate the groups.
ps %>%
comp_barplot("Family", n_taxa = 10, merge_other = FALSE) +
facet_wrap(facets = vars(diagnosis), scales = "free") +
coord_flip() More examples/tutorial of visualizing microbiota compositions using stacked bar charts can be found here: https://david-barnett.github.io/microViz/articles/web-only/compositions.html
9 Taxon stats
From the PCA loadings and barplots below, we have some strong suspicions about which taxa have a higher relative abundance in vaginally delivered infants than in c-section delivered infants, and vice versa, but we can also statistically test this. This is often called “differential abundance” (DA) testing, in the style of “differential expression” (DE) testing from the transcriptomics field.
ps %>%
comp_barplot(
tax_level = "Genus", n_taxa = 12, facet_by = "diagnosis", label = NULL,
) +
coord_flip() +
theme(axis.ticks.y = element_blank())9.1 Model one taxon
- We will start by creating a linear regression model for one genus, Bacteroides.
- We will transform the count data by first making it proportions, and then taking a base 2 logarithm, log2, after adding a pseudocount.
bacteroidesRegression1 <- ps %>%
tax_transform("compositional", rank = "Genus") %>%
tax_model(
type = "lm", rank = "Genus",
trans = "log2", trans_args = list(zero_replace = "halfmin"),
taxa = "Bacteroides", variables = "case_control",
return_psx = FALSE
) %>%
pluck(1)Modelling: Bacteroides
- Looking at the regression results
summary(bacteroidesRegression1)
Call:
Bacteroides ~ case_control
Residuals:
Min 1Q Median 3Q Max
-10.4529 -0.1391 1.8445 2.9517 4.5866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.8342 0.5466 -8.844 8.58e-14 ***
case_controlControl 1.7751 1.0584 1.677 0.0971 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.44 on 88 degrees of freedom
Multiple R-squared: 0.03097, Adjusted R-squared: 0.01996
F-statistic: 2.812 on 1 and 88 DF, p-value: 0.09708
broom::tidy(bacteroidesRegression1, conf.int = TRUE)# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -4.83 0.547 -8.84 8.58e-14 -5.92 -3.75
2 case_controlControl 1.78 1.06 1.68 9.71e- 2 -0.328 3.88
9.2 Covariate-adjusted model
We can fit a model with covariates, as we did for PERMANOVA
- We will convert the categorical variables into indicator (dummy) variables
- We will scale the continuous covariates to 0 mean and SD 1 (z-scores)
- You’ll see this will make our subsequent plots easier to interpret later
bacteroidesRegression2 <- ps %>%
tax_transform("compositional", rank = "Genus") %>%
tax_model(
type = "lm", rank = "Genus", taxa = "Bacteroides",
trans = "log2", trans_args = list(zero_replace = "halfmin"),
variables = c("IBD", "Female", "Age_Z"),
return_psx = FALSE
) %>%
pluck(1)Modelling: Bacteroides
- Looking at the regression results
summary(bacteroidesRegression2)
Call:
Bacteroides ~ IBD + Female + Age_Z
Residuals:
Min 1Q Median 3Q Max
-10.6693 -0.2239 1.6678 3.0247 4.8986
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.7386 1.1678 -2.345 0.0213 *
IBD -1.7293 1.2122 -1.427 0.1573
Female -0.6928 0.9538 -0.726 0.4696
Age_Z -0.1101 0.5381 -0.205 0.8383
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.476 on 86 degrees of freedom
Multiple R-squared: 0.03782, Adjusted R-squared: 0.004253
F-statistic: 1.127 on 3 and 86 DF, p-value: 0.3429
broom::tidy(bacteroidesRegression2, conf.int = TRUE)# A tibble: 4 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -2.74 1.17 -2.35 0.0213 -5.06 -0.417
2 IBD -1.73 1.21 -1.43 0.157 -4.14 0.681
3 Female -0.693 0.954 -0.726 0.470 -2.59 1.20
4 Age_Z -0.110 0.538 -0.205 0.838 -1.18 0.959
9.3 There are many DA methods!
- This method simple method is borrowed from MaAsLin2
- Note: they call the compositional transformation “Total Sum Scaling (TSS)”)
- This is quite a straightforward method, so we will stick with this for today
- But, many statistical methods have been developed for differential abundance analyses
Microbiome abundance data are quite awkward, statistically speaking, due to their sparseness and compositionality. Each successive method claims to handle some aspect of this awkwardness “better” than previous methods.
The aim is to have a method with adequate power to detect true associations, whilst controlling the type 1 error rate, the “false positive” reporting of associations that are not “truly” present.
Results are surprisingly inconsistent across the different methods, as demonstrated this year in a fascinating analysis by Jacob Nearing and colleagues.
So, what to do?
- Filter out the noise & interpret results with caution! use multiple testing corrections
- Remember it’s all relative (abundance)
- Try 2 or 3 methods and/or use same method as a previous study if replicating
- Avoid Lefse and edgeR?
- Beware: Not all methods allow covariate adjustment & few allow random effects (for time-series)
9.4 Now model all the taxa!?
- We’re not normally interested in just one taxon!
- It’s also hard to decide which taxonomic rank we are most interested in modelling!
- Lower ranks like species or ASVs give better resolution but also more sparsity and classification uncertainty…
- Higher ranks e.g. classes, could also be more powerful if you think most taxa within that class will follow a similar pattern.
- So now we will fit a similar model for almost every taxon* at every rank from phylum to genus
- *We’ll filter out species with a prevalence of less than 10%
# The code for `taxatree_models` is quite similar to tax_model.
# However, you might need to run `tax_prepend_ranks` to ensure that each taxon at each rank is always unique.
psModels <- ps %>%
tax_prepend_ranks() %>%
tax_transform("compositional", rank = "Genus", keep_counts = TRUE) %>%
tax_filter(min_prevalence = 0.1, undetected = 0, use_counts = TRUE) %>%
taxatree_models(
type = lm,
trans = "log2", trans_args = list(zero_replace = "halfmin"),
ranks = c("Phylum", "Class", "Order", "Family", "Genus"),
variables = c("IBD", "Female", "Age_Z")
)psModelspsExtra object - a phyloseq object with extra slots:
phyloseq-class experiment-level object
otu_table() OTU Table: [ 65 taxa and 90 samples ]
sample_data() Sample Data: [ 90 samples by 23 sample variables ]
tax_table() Taxonomy Table: [ 65 taxa by 6 taxonomic ranks ]
otu_get(counts = TRUE) [ 65 taxa and 90 samples ]
psExtra info:
tax_agg = "Genus" tax_trans = "compositional"
taxatree_models list:
Ranks: Phylum/Class/Order/Family/Genus
Why filter the taxa? It’s less likely that we are interested in rare taxa, and models of rare taxon abundances are more likely to be unreliable. Reducing the the number of taxa modelled also makes the process faster and makes visualizing the results easier!
We probably want to filter out rare taxa, before performing some kinds of analysis.
Why remove rare taxa?
Rare taxa might sometimes be:
- Sequencing errors
- Statistically problematic
- Biologically irrelevant
How to remove rare taxa?
What is rare? Two main concepts.
- Low prevalence - taxon only detected in a small number of samples in your dataset.
- Low abundance - relatively few reads assigned to that taxon (on average or in total)
Considering the impact of issues 1, 2, and 3, let’s say we are not interested in Species that occur in fewer than 2% of samples, and they have to have at least 10,000 reads in total across all samples.
ntaxa(ps) # before filtering[1] 36349
ps %>%
tax_filter(min_prevalence = 2, min_total_abundance = 100) %>%
ntaxa() # after filtering[1] 299
- Wow so that would remove most of our unique taxa!
- What is going on? Let’s make some plots!
# make table of summary statistics for the unique taxa in shao19
psTaxaStats <- tibble(
taxon = taxa_names(ps),
prevalence = microbiome::prevalence(ps),
total_abundance = taxa_sums(ps)
)Some ggplot2 code
p <- psTaxaStats %>%
ggplot(aes(total_abundance, prevalence)) +
geom_point(alpha = 0.5) +
geom_rug(alpha = 0.1) +
scale_x_continuous(labels = scales::label_number(), name = "Total Abundance") +
scale_y_continuous(
labels = scales::label_percent(), breaks = scales::breaks_pretty(n = 9),
name = "Prevalence (%)",
sec.axis = sec_axis(
trans = ~ . * nsamples(ps), breaks = scales::breaks_pretty(n = 9),
name = "Prevalence (N samples)"
)
) +
theme_bw()Warning: The `trans` argument of `sec_axis()` is deprecated as of ggplot2 3.5.0.
ℹ Please use the `transform` argument instead.
Some ggplot2 code
pSo most taxa have a low prevalence, and handful have way more reads than most.
Let’s label those points to check which taxa are the big time players.
p + ggrepel::geom_text_repel(
data = function(df) filter(df, total_abundance > 1500 | prevalence > 0.6),
mapping = aes(label = taxon), size = 2.5, min.segment.length = 0,
force = 20, nudge_y = 0.05
)Those taxa make sense for this dataset of gut microbiota samples.
Now let’s zoom in on the less abundant taxa by log-transforming the axes. We’ll also add lines indicating the thresholds of 2% prevalence and 1000 reads abundance.
Some more ggplot2 code
psTaxaStats %>%
ggplot(aes(x = total_abundance, y = prevalence)) +
geom_vline(xintercept = 10, color = "red", linetype = "dotted") +
geom_hline(yintercept = 2 / 100, color = "red", linetype = "dotted") +
geom_point(alpha = 0.5) +
geom_rug(alpha = 0.1) +
scale_x_log10(labels = scales::label_number(), name = "Total Abundance") +
scale_y_log10(
labels = scales::label_percent(), breaks = scales::breaks_log(n = 9),
name = "Prevalence (%)",
sec.axis = sec_axis(
trans = ~ . * nsamples(ps), breaks = scales::breaks_log(n = 9),
name = "Prevalence (N samples)"
)
) +
theme_bw()Warning in scale_x_log10(labels = scales::label_number(), name = "Total
Abundance"): log-10 transformation introduced infinite values.
Warning in scale_y_log10(labels = scales::label_percent(), breaks =
scales::breaks_log(n = 9), : log-10 transformation introduced infinite values.
Warning in scale_x_log10(labels = scales::label_number(), name = "Total
Abundance"): log-10 transformation introduced infinite values.
Warning in scale_y_log10(labels = scales::label_percent(), breaks =
scales::breaks_log(n = 9), : log-10 transformation introduced infinite values.
- We can break this down by phylum if we add the taxonomic table information.
A lot more ggplot2 code!
# don't worry about this code if it's confusing, just focus on the plot output
ps %>%
tax_table() %>%
as.data.frame() %>%
as_tibble(rownames = "taxon") %>%
left_join(psTaxaStats, by = "taxon") %>%
add_count(Phylum, name = "phylum_count", sort = TRUE) %>%
mutate(Phylum = factor(Phylum, levels = unique(Phylum))) %>% # to fix facet order
mutate(Phylum = forcats::fct_lump_n(Phylum, n = 5)) %>%
mutate(Phylum = forcats::fct_explicit_na(Phylum, na_level = "Other")) %>%
ggplot(aes(total_abundance, prevalence)) +
geom_vline(xintercept = 10, color = "red", linetype = "dotted") +
geom_hline(yintercept = 2 / 100, color = "red", linetype = "dotted") +
geom_point(alpha = 0.5, size = 1) +
geom_rug(alpha = 0.2) +
scale_x_log10(
labels = scales::label_log(), breaks = scales::breaks_log(n = 5),
name = "Total Abundance"
) +
scale_y_log10(
labels = scales::label_percent(), breaks = scales::breaks_log(n = 9),
name = "Prevalence (%)",
sec.axis = sec_axis(
trans = ~ . * nsamples(shao19), breaks = scales::breaks_log(n = 9),
name = "Prevalence (N samples)"
)
) +
facet_wrap("Phylum") +
theme_bw(10)Warning: There was 1 warning in `mutate()`.
ℹ In argument: `Phylum = forcats::fct_explicit_na(Phylum, na_level = "Other")`.
Caused by warning:
! `fct_explicit_na()` was deprecated in forcats 1.0.0.
ℹ Please use `fct_na_value_to_level()` instead.
Warning in scale_x_log10(labels = scales::label_log(), breaks =
scales::breaks_log(n = 5), : log-10 transformation introduced infinite values.
Warning in scale_y_log10(labels = scales::label_percent(), breaks =
scales::breaks_log(n = 9), : log-10 transformation introduced infinite values.
Warning in scale_x_log10(labels = scales::label_log(), breaks =
scales::breaks_log(n = 5), : log-10 transformation introduced infinite values.
Warning in scale_y_log10(labels = scales::label_percent(), breaks =
scales::breaks_log(n = 9), : log-10 transformation introduced infinite values.
How you pick a threshold, depends on what analysis method you are filtering for!
- alpha diversity: do not filter
- beta diversity: relevance of threshold depends on your distance measure
- differential abundance testing: stringent filtering, prevalence >5%, >10%?
Getting stats from the models
Next we will get a data.frame containing the regression coefficient estimates, test statistics and corresponding p values from all these regression models.
psStats <- taxatree_models2stats(psModels)
psStatspsExtra object - a phyloseq object with extra slots:
phyloseq-class experiment-level object
otu_table() OTU Table: [ 65 taxa and 90 samples ]
sample_data() Sample Data: [ 90 samples by 23 sample variables ]
tax_table() Taxonomy Table: [ 65 taxa by 6 taxonomic ranks ]
otu_get(counts = TRUE) [ 65 taxa and 90 samples ]
psExtra info:
tax_agg = "Genus" tax_trans = "compositional"
taxatree_stats dataframe:
126 taxa at 5 ranks: Phylum, Class, Order, Family, Genus
3 terms: IBD, Female, Age_Z
psStats %>% taxatree_stats_get()# A tibble: 378 × 8
term taxon rank formula estimate std.error statistic p.value
<fct> <chr> <fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 IBD P: Firmicutes Phylum `P: Fir… -0.626 0.332 -1.89 6.25e-2
2 Female P: Firmicutes Phylum `P: Fir… -0.391 0.261 -1.50 1.37e-1
3 Age_Z P: Firmicutes Phylum `P: Fir… 0.151 0.147 1.02 3.09e-1
4 IBD P: Proteobacteria Phylum `P: Pro… 4.20 0.950 4.42 2.89e-5
5 Female P: Proteobacteria Phylum `P: Pro… 0.778 0.747 1.04 3.01e-1
6 Age_Z P: Proteobacteria Phylum `P: Pro… -0.568 0.422 -1.35 1.81e-1
7 IBD P: Bacteroidetes Phylum `P: Bac… -1.75 1.17 -1.49 1.39e-1
8 Female P: Bacteroidetes Phylum `P: Bac… -0.802 0.922 -0.870 3.87e-1
9 Age_Z P: Bacteroidetes Phylum `P: Bac… -0.181 0.520 -0.348 7.29e-1
10 IBD P: Actinobacteria Phylum `P: Act… -1.67 0.916 -1.82 7.18e-2
# ℹ 368 more rows
Adjusting p values
We have performed a lot of statistical tests here!
It is likely that we could find some significant p-values by chance alone.
We should correct for multiple testing / control the false discovery rate or family-wise error rate.
Instead of applying these adjustment methods across all taxa models at all ranks, the default behaviour is to control the family-wise error rate per taxonomic rank.
psStats <- psStats %>% taxatree_stats_p_adjust(method = "BH", grouping = "rank")
# notice the new variable
psStats %>% taxatree_stats_get()# A tibble: 378 × 9
# Groups: rank [5]
term taxon rank formula estimate std.error statistic p.value p.adj.BH.rank
<fct> <chr> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IBD P: F… Phyl… `P: Fi… -0.626 0.332 -1.89 6.25e-2 0.188
2 Female P: F… Phyl… `P: Fi… -0.391 0.261 -1.50 1.37e-1 0.291
3 Age_Z P: F… Phyl… `P: Fi… 0.151 0.147 1.02 3.09e-1 0.496
4 IBD P: P… Phyl… `P: Pr… 4.20 0.950 4.42 2.89e-5 0.000607
5 Female P: P… Phyl… `P: Pr… 0.778 0.747 1.04 3.01e-1 0.496
6 Age_Z P: P… Phyl… `P: Pr… -0.568 0.422 -1.35 1.81e-1 0.346
7 IBD P: B… Phyl… `P: Ba… -1.75 1.17 -1.49 1.39e-1 0.291
8 Female P: B… Phyl… `P: Ba… -0.802 0.922 -0.870 3.87e-1 0.496
9 Age_Z P: B… Phyl… `P: Ba… -0.181 0.520 -0.348 7.29e-1 0.729
10 IBD P: A… Phyl… `P: Ac… -1.67 0.916 -1.82 7.18e-2 0.188
# ℹ 368 more rows
9.5 Plot all the taxatree_stats!
-
taxatree_plots()allows you to plot statistics from all of the taxa models onto a tree layout (e.g. point estimates and significance). - The taxon model results are organised by rank, radiating out from the central root node
- e.g. from Phyla around the center to Genus in the outermost ring.
taxatree_plots() itself returns a list of plots, which you can arrange into one figure with the patchwork package for example (and/or cowplot).
psStats %>%
taxatree_plots(node_size_range = c(1, 3), sig_stat = "p.adj.BH.rank") %>%
patchwork::wrap_plots(ncol = 2, guides = "collect")Taxatree Key
But how do we know which taxa are which nodes? We can create a labelled grey tree with taxatree_plotkey(). This labels only some of the taxa based on certain conditions that we specify.
set.seed(123) # label position
key <- psStats %>%
taxatree_plotkey(
taxon_renamer = function(x) stringr::str_remove(x, "[PFG]: "),
# conditions below, for filtering taxa to be labelled
rank == "Phylum" | rank == "Genus" & prevalence > 0.2
# all phyla are labelled, and all genera with a prevalence of over 0.2
)
keyYou can do more with these trees to customise them to your liking. See an extended tutorial here on the microViz website: including how to directly label taxa on the colored plots, change the layout and style of the trees, and even how to use a different regression modelling approach.
# try it out!10 Next! ⏩
11 Session info
sessioninfo::session_info()─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.5
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Amsterdam
date 2024-06-29
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
ade4 1.7-22 2023-02-06 [1] CRAN (R 4.4.0)
ape 5.8 2024-04-11 [1] CRAN (R 4.4.0)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
bayestestR 0.13.2 2024-02-12 [1] RSPM (R 4.4.0)
Biobase 2.64.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
BiocGenerics 0.50.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
biomformat 1.32.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
Biostrings 2.72.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
broom * 1.0.6 2024-05-17 [1] CRAN (R 4.4.0)
ca 0.71.1 2020-01-24 [1] CRAN (R 4.4.0)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
Cairo 1.6-2 2023-11-28 [1] CRAN (R 4.4.0)
circlize 0.4.16 2024-02-20 [1] CRAN (R 4.4.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0)
clue 0.3-65 2023-09-23 [1] CRAN (R 4.4.0)
cluster 2.1.6 2023-12-01 [1] CRAN (R 4.4.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.0)
ComplexHeatmap 2.20.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
corncob 0.4.1 2024-01-10 [1] CRAN (R 4.4.0)
correlation 0.8.5 2024-06-16 [1] RSPM (R 4.4.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.4.0)
data.table 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
datawizard 0.11.0 2024-06-05 [1] RSPM (R 4.4.0)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0)
doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.4.0)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
effectsize 0.8.8 2024-05-12 [1] RSPM (R 4.4.0)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.4.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.4.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
GenomeInfoDb 1.40.1 2024-05-24 [1] Bioconductor 3.19 (R 4.4.0)
GenomeInfoDbData 1.2.12 2024-05-26 [1] Bioconductor
GetoptLong 1.0.5 2020-12-15 [1] CRAN (R 4.4.0)
ggforce 0.4.2 2024-02-19 [1] CRAN (R 4.4.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
ggraph 2.2.1 2024-03-07 [1] CRAN (R 4.4.0)
ggrepel 0.9.5 2024-01-10 [1] CRAN (R 4.4.0)
ggstatsplot * 0.12.3 2024-04-06 [1] RSPM (R 4.4.0)
GlobalOptions 0.1.2 2020-06-10 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
graphlayouts 1.1.1 2024-03-09 [1] CRAN (R 4.4.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.4.0)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.4.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.4.0)
igraph 2.0.3 2024-03-13 [1] CRAN (R 4.4.0)
insight 0.20.1 2024-06-11 [1] RSPM (R 4.4.0)
IRanges 2.38.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.4.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
knitr 1.47 2024-05-29 [1] CRAN (R 4.4.0)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
MASS 7.3-60.2 2024-04-24 [1] local
Matrix 1.7-0 2024-03-22 [1] CRAN (R 4.4.0)
matrixStats 1.3.0 2024-04-11 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mgcv 1.9-1 2023-12-21 [1] CRAN (R 4.4.0)
microbiome 1.26.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
microViz * 0.12.3 2024-06-06 [1] local
multtest 2.60.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
nlme 3.1-164 2023-11-27 [1] CRAN (R 4.4.0)
paletteer 1.6.0 2024-01-21 [1] RSPM (R 4.4.0)
parameters 0.21.7 2024-05-14 [1] RSPM (R 4.4.0)
patchwork 1.2.0 2024-01-08 [1] RSPM (R 4.4.0)
permute 0.9-7 2022-01-27 [1] CRAN (R 4.4.0)
phyloseq * 1.48.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
png 0.1-8 2022-11-29 [1] CRAN (R 4.4.0)
polyclip 1.10-6 2023-09-27 [1] CRAN (R 4.4.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
registry 0.5-1 2019-03-05 [1] CRAN (R 4.4.0)
rematch2 2.1.2 2020-05-01 [1] CRAN (R 4.4.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.4.0)
rhdf5 2.48.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
rhdf5filters 1.16.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
Rhdf5lib 1.26.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
rjson 0.2.21 2022-01-09 [1] CRAN (R 4.4.0)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.0)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
Rtsne 0.17 2023-12-07 [1] CRAN (R 4.4.0)
S4Vectors 0.42.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
seriation 1.5.5 2024-04-17 [1] CRAN (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.4.0)
statsExpressions 1.5.4 2024-03-20 [1] RSPM (R 4.4.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
survival 3.5-8 2024-02-14 [1] CRAN (R 4.4.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
tidygraph 1.3.1 2024-01-30 [1] CRAN (R 4.4.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
TSP 1.2-4 2023-04-04 [1] CRAN (R 4.4.0)
tweenr 2.0.3 2024-02-26 [1] CRAN (R 4.4.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
UCSC.utils 1.0.0 2024-05-06 [1] Bioconductor 3.19 (R 4.4.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
vegan 2.6-6.1 2024-05-21 [1] CRAN (R 4.4.0)
viridis 0.6.5 2024-01-29 [1] CRAN (R 4.4.0)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.4.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
writexl * 1.5.0 2024-02-09 [1] CRAN (R 4.4.0)
xfun 0.44 2024-05-15 [1] CRAN (R 4.4.0)
XVector 0.44.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.4.0)
zeallot 0.1.0 2018-01-28 [1] RSPM (R 4.4.0)
zlibbioc 1.50.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────